Abstract. Over the past few years, symmetric positive definite (SPD) matrices
have been receiving considerable attention from the computer vision community.
Though various distance measures have been proposed in the past for comparing
SPD matrices, the two most widely used measures are the affine-invariant distance
and the log-Euclidean distance. This is because these two measures are true
geodesic distances induced by Riemannian geometry. In this work, we focus on
the log-Euclidean Riemannian geometry and propose a data-driven approach for
learning Riemannian metrics/geodesic distances for SPD matrices. We show that
the geodesic distance learned using the proposed approach performs better than
various existing distance measures when evaluated on face matching and clustering
tasks.


- $I$ denotes the identity matrix of appropriate size.

- $\langle \cdot , \cdot \rangle$ denotes an inner product.

- $S_n$ denotes the set of $n \times n$ symmetric matrices.

- $S_n^{++}$ denotes the set of $n \times n$ symmetric positive definite matrices.

- $T_p\mathcal{M}$ denotes the tangent space to the manifold $\mathcal{M}$ at the point $p \in \mathcal{M}$.

- $\lVert \cdot \rVert_F$ denotes the matrix Frobenius norm.

- Chol(P) denotes the lower triangular matrix obtained from the Cholesky decomposition
of a matrix P.

- exp() and log() denote the matrix exponential and logarithm respectively.

- $\frac{\partial}{\partial x}$ and $\frac{\partial^2}{\partial x^2}$ represent partial derivatives.
1 Introduction

Many computer vision applications involve features that obey specific constraints. Such
features often lie in non-Euclidean spaces, where the underlying distance metric is not
the regular $\ell_2$ norm. For instance, popular features like shapes, rotation matrices, linear
subspaces, symmetric positive definite (SPD) matrices, etc. are known to lie on Riemannian
manifolds. In such cases, one needs to develop inference techniques that make
use of the underlying manifold structure.

Over the past few years, manifolds have been receiving considerable attention from
the computer vision community. In this work, we focus our attention on the set of SPD
matrices. Examples of SPD matrices in computer vision include diffusion tensors [1],
structure tensors [2] and covariance region descriptors [3]. Diffusion tensors arise
naturally in medical imaging [1]. In diffusion tensor magnetic resonance imaging
(DT-MRI), water diffusion in tissues is represented by a diffusion tensor characterizing
the anisotropy within the tissue. In optical flow estimation and motion segmentation,
structure tensors are often employed to encode important image features, such as texture
and motion [2]. Covariance region descriptors are used in texture classification [3],
object detection [4], object tracking, action recognition and face recognition [5]. There
are several advantages of using covariance matrices as region descriptors. Covariance
matrices provide a natural way of fusing multiple features which might be correlated. The
diagonal entries of a covariance matrix represent the variance of individual features and
the non-diagonal entries represent the cross correlations. The noise corrupting individual
samples is largely filtered out with an averaging filter during covariance computation.
Covariance matrices are low dimensional compared to joint feature histograms. Covariance
matrices do not have any information regarding the ordering and the number of
points. This implies a certain level of scale and rotation invariance over the regions in
different images.

Various distance measures have been proposed in the literature for the comparison
of SPD matrices. Among them, the two most widely used distance measures are the
affine-invariant distance [1] and the log-Frobenius distance [6] (also referred to as the
log-Euclidean distance in the literature). The main reason for their popularity is that they
are geodesic distances induced by Riemannian metrics.

The log-Euclidean framework [6] proposed by Arsigny et al. defines a class of Riemannian
metrics, rather than a single metric, called log-Euclidean Riemannian metrics.
According to this framework, any inner product $\langle \cdot , \cdot \rangle$ defined on
$T_I S_n^{++} = \{\log(P) \mid P \in S_n^{++}\} = S_n$ extended to $S_n^{++}$ by left- or
right-multiplication is a bi-invariant Riemannian metric. Equipped with this bi-invariant
metric, the space of SPD matrices is a flat Riemannian space and the geodesic distance
corresponding to this bi-invariant Riemannian metric is equal to the distance induced by
$\langle \cdot , \cdot \rangle$ in $T_I S_n^{++}$. Surprisingly, this remarkable result has
not been used by the computer vision community. Since $T_I S_n^{++} = S_n$
is a vector space, this result allows us to learn log-Euclidean Riemannian metrics and
corresponding log-Euclidean geodesic distances from the data by using Mahalanobis
distance learning techniques like information-theoretic metric learning (ITML) [7] and
large margin nearest neighbor distance learning [8] in $T_I S_n^{++}$. In this work, we
explore this idea of data-driven Riemannian metrics/geodesic distances for the set of SPD
matrices. For learning Mahalanobis distances in $T_I S_n^{++}$ we use the ITML technique.
Organization: In section 2, we provide a brief overview of various distance measures
used in the literature to compare SPD matrices. We briefly explain the ITML
technique in section 3 and present our approach for learning log-Euclidean Riemannian
metrics/log-Euclidean geodesic distances from the data in section 4. We provide some
experimental results in section 5 and conclude the paper in section 6.

2 Distances to compare SPD matrices 

Various distance measures have been used in the literature to compare SPD matrices. 
Each distance has been derived from different geometrical, statistical or information- 
theoretic considerations. Though many of these distances try to capture the non-linearity 
of SPD matrices, not all of them are geodesic distances induced by Riemannian metrics. 
Tables 1 and 2 summarize these distances and their properties. Among them, the
log-Frobenius distance [6] and the affine-invariant distance [1] are the most popular ones.
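As an illustration of the distances discussed in this section, the following is a minimal NumPy sketch of four of them (Frobenius, Cholesky-Frobenius, log-Frobenius and affine-invariant). The function names are hypothetical, and computing the matrix logarithm through an eigendecomposition is an implementation choice assumed here, not something prescribed by the text.

```python
import numpy as np

def spd_log(P):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def spd_inv_sqrt(P):
    """Inverse square root P^(-1/2) of an SPD matrix."""
    w, V = np.linalg.eigh(P)
    return (V / np.sqrt(w)) @ V.T

def frobenius(P1, P2):
    return np.linalg.norm(P1 - P2, "fro")

def cholesky_frobenius(P1, P2):
    # np.linalg.cholesky returns the lower triangular factor.
    return np.linalg.norm(np.linalg.cholesky(P1) - np.linalg.cholesky(P2), "fro")

def log_frobenius(P1, P2):
    return np.linalg.norm(spd_log(P1) - spd_log(P2), "fro")

def affine_invariant(P1, P2):
    S = spd_inv_sqrt(P1)
    X = S @ P2 @ S
    X = 0.5 * (X + X.T)  # remove round-off asymmetry before taking the log
    return np.linalg.norm(spd_log(X), "fro")
```

With these definitions one can check numerically that the affine-invariant distance is unchanged under a congruence $P \mapsto G P G^\top$, while the log-Frobenius distance is unchanged under scaling and inversion.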

3 Mahalanobis distance learning using ITML 

Information-theoretic metric learning [7] is a technique for learning Mahalanobis distance
functions from the data based on similarity and dissimilarity constraints. Let
$\{x_i\}_{i=1}^{N}$ be a set of $N$ points in $\mathbb{R}^d$. Given pairs of similar points $S$ and
pairs of dissimilar points $D$, the aim of ITML is to learn an SPD matrix $M$ such that the
Mahalanobis distance parametrized by $M$ is below a given threshold $l$ for similar pairs of
points and above a given threshold $u$ for dissimilar pairs of points.

Let $D_{ld}$ denote the LogDet divergence between SPD matrices defined as

$$D_{ld}(P, Q) = \mathrm{trace}(PQ^{-1}) - \log\det(PQ^{-1}) - n, \quad P, Q \in S_n^{++}. \tag{1}$$

ITML formulates the Mahalanobis matrix learning as the following optimization problem:

$$
\begin{aligned}
\underset{M \succeq 0,\ \xi}{\text{minimize}} \quad & D_{ld}(M, M_0) + \gamma\, D_{ld}(\mathrm{diag}(\xi), \mathrm{diag}(\xi_0)) \\
\text{subject to} \quad & (x_i - x_j)^\top M (x_i - x_j) \le \xi_{c(i,j)}, \quad \forall (i,j) \in S, \\
& (x_i - x_j)^\top M (x_i - x_j) \ge \xi_{c(i,j)}, \quad \forall (i,j) \in D,
\end{aligned}
\tag{2}
$$

where $c(i,j)$ denotes the index of the $(i,j)$-th constraint, $\xi$ is the vector of variables
$\xi_{c(i,j)}$, $\xi_0$ is a vector whose components equal $l$ for similarity constraints and $u$ for
dissimilarity constraints, $M_0$ is an SPD matrix that captures the prior knowledge about
$M$, and $\gamma$ is a parameter controlling the tradeoff between satisfying the constraints and
minimizing $D_{ld}(M, M_0)$. This optimization problem can be solved efficiently using
Bregman iterations. In this work, we use the publicly available ITML code provided by
the authors of [7].
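To make the quantities in (1) and (2) concrete, here is a small NumPy sketch that evaluates the LogDet divergence, the ITML objective and the quadratic form used in the constraints. It only evaluates these expressions; it is not the ITML solver, and the function names are hypothetical.

```python
import numpy as np

def logdet_divergence(P, Q):
    """LogDet divergence D_ld(P, Q) between two SPD matrices, as in (1)."""
    n = P.shape[0]
    PQinv = P @ np.linalg.inv(Q)
    # slogdet is numerically safer than log(det(.)); the sign is +1 for SPD inputs.
    _, logabsdet = np.linalg.slogdet(PQinv)
    return np.trace(PQinv) - logabsdet - n

def itml_objective(M, M0, xi, xi0, gamma):
    """Objective of (2) for given M and slack vector xi (entries must be positive)."""
    return logdet_divergence(M, M0) + gamma * logdet_divergence(np.diag(xi), np.diag(xi0))

def mahalanobis_form(M, x, y):
    """Quadratic form (x - y)^T M (x - y) appearing in the constraints of (2)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)
```

The objective is zero exactly when $M = M_0$ and $\xi = \xi_0$, so the divergence terms measure how far the learned matrix and slack variables move away from the prior and the target thresholds.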

ITML parameters: We need to specify the values for the following parameters
while using ITML: $M_0$, $\gamma$, $l$, $u$. We choose the constraint thresholds $l$ and $u$ as the
$a$-th and $b$-th percentiles of the observed distribution of distances between pairs of points
within the training dataset. Hence, the parameters for the ITML algorithm are $M_0$, $\gamma$, $a$
and $b$.
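The percentile-based choice of thresholds can be sketched as follows. This is an illustration under an assumption: pairwise distances are measured here with the quadratic form induced by the prior $M_0$ (plain squared Euclidean distance when $M_0 = I$), which the text does not specify. The function name is hypothetical.

```python
import numpy as np

def percentile_thresholds(X, a=5, b=95, M0=None):
    """Return (l, u) as the a-th and b-th percentiles of pairwise distances.

    X is an (N, d) array of training points. Distances are the quadratic
    forms (x_i - x_j)^T M0 (x_i - x_j); using M0 here is an assumption.
    """
    X = np.asarray(X, dtype=float)
    n_pts, d = X.shape
    M0 = np.eye(d) if M0 is None else M0
    i, j = np.triu_indices(n_pts, k=1)          # all unordered pairs i < j
    diff = X[i] - X[j]
    dist = np.einsum("ij,jk,ik->i", diff, M0, diff)
    l, u = np.percentile(dist, [a, b])
    return float(l), float(u)
```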
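Table 1: SPD matrix distances and their properties

| Distance | Formula | Symmetric | Triangle inequality | Geodesic |
|---|---|---|---|---|
| Frobenius | $\lVert P_1 - P_2 \rVert_F$ | Yes | Yes | No |
| Cholesky-Frobenius [13] | $\lVert \mathrm{Chol}(P_1) - \mathrm{Chol}(P_2) \rVert_F$ | Yes | Yes | No |
| J-divergence [12] | $\frac{1}{2}\sqrt{\mathrm{trace}(P_1 P_2^{-1} + P_2 P_1^{-1}) - 2n}$ | Yes | No | No |
| Jensen-Bregman LogDet divergence [11] | $\sqrt{\log\det\left(\frac{P_1 + P_2}{2}\right) - \frac{1}{2}\log\det(P_1 P_2)}$ | Yes | No | No |
| Affine-invariant [1] | $\lVert \log(P_1^{-1/2} P_2 P_1^{-1/2}) \rVert_F$ | Yes | Yes | Yes |
| Log-Frobenius [6] | $\lVert \log(P_1) - \log(P_2) \rVert_F$ | Yes | Yes | Yes |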


Table 2: SPD matrix distances and their properties

| Distance | Distance from $S_n$ | Affine invariance | Scale invariance | Rotation invariance | Inversion invariance |
|---|---|---|---|---|---|
| Frobenius | Finite | No | No | Yes | No |
| Cholesky-Frobenius [13] | Finite | No | No | No | No |
| J-divergence [12] | Infinite | Yes | Yes | Yes | Yes |
| Jensen-Bregman LogDet divergence [11] | Infinite | Yes | Yes | Yes | Yes |
| Affine-invariant [1] | Infinite | Yes | Yes | Yes | Yes |
| Log-Frobenius [6] | Infinite | No | Yes | Yes | Yes |
4 Log-Euclidean Riemannian metric learning

The log-Euclidean framework [6] proposed by Arsigny et al. defines a class of Riemannian
metrics called log-Euclidean metrics. The geodesic distances associated with
log-Euclidean metrics are called log-Euclidean distances. Let $\odot$ be an operation on SPD
matrices defined as $P_1 \odot P_2 = \exp(\log(P_1) + \log(P_2))$. We have the following result
based on the log-Euclidean framework introduced in [6]:

Result 4.1: Any inner product $\langle \cdot , \cdot \rangle$ defined on
$T_I S_n^{++} = \{\log(P) \mid P \in S_n^{++}\} = S_n$ extended to the Lie group
$(S_n^{++}, \odot)$ by left- or right-multiplication is a bi-invariant
Riemannian metric. The corresponding geodesic distance between $P_1 \in S_n^{++}$ and
$P_2 \in S_n^{++}$ is given by

$$d(P_1, P_2) = \lVert \mathrm{mlog}_I(P_1) - \mathrm{mlog}_I(P_2) \rVert_I = \lVert \log(P_1) - \log(P_2) \rVert_I, \tag{3}$$

where $\lVert \cdot \rVert_I$ is the norm induced by $\langle \cdot , \cdot \rangle$. Note that here
$\mathrm{mlog}_I$ is the inverse-exponential map at the identity matrix, which is equal to the
usual matrix logarithm in this case.
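The group operation and the invariance stated in Result 4.1 can be checked numerically. The sketch below implements $P_1 \odot P_2 = \exp(\log(P_1) + \log(P_2))$ with eigendecompositions and uses the Frobenius inner product as one particular choice of $\langle \cdot , \cdot \rangle$; the helper names are hypothetical.

```python
import numpy as np

def spd_log(P):
    """Matrix logarithm of an SPD matrix (result is symmetric)."""
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def sym_exp(S):
    """Matrix exponential of a symmetric matrix (result is SPD)."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_product(P1, P2):
    """Log-Euclidean group operation: exp(log(P1) + log(P2))."""
    return sym_exp(spd_log(P1) + spd_log(P2))

def log_euclidean_distance(P1, P2):
    """Distance (3) for the Frobenius inner product on the log domain."""
    return np.linalg.norm(spd_log(P1) - spd_log(P2), "fro")
```

Because $\log(A \odot P) = \log(A) + \log(P)$, multiplying both arguments by the same SPD matrix $A$ under $\odot$ cancels in the difference of logarithms, which is why the distance is invariant under the group action.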

The set of all $n \times n$ symmetric matrices forms a vector space of dimension $d = \frac{n(n+1)}{2}$.

Let $\mathrm{vec}(P)$ denote the column vector form of the upper triangular part of a matrix $P$.
This $\mathrm{vec}()$ operation provides a $d$-dimensional vector representation for $S_n$. Let
$\langle \cdot , \cdot \rangle$ be an inner product defined on the vector space $S_n$ and
$M \in S_d^{++}$ be the corresponding matrix of inner products between the $d$ basis vectors
corresponding to the $\mathrm{vec}()$ representation. Note that $\langle \cdot , \cdot \rangle$ is
uniquely characterized by $M$. The distance between two matrices
$P_1 \in S_n$ and $P_2 \in S_n$ induced by this inner product is given by

$$d(P_1, P_2) = (\mathrm{vec}(P_1) - \mathrm{vec}(P_2))^\top M \, (\mathrm{vec}(P_1) - \mathrm{vec}(P_2)). \tag{4}$$
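A short sketch may help to see how the matrix $M$ encodes an inner product in the $\mathrm{vec}()$ representation. It assumes the basis of $S_n$ in which each upper triangular entry $(i, j)$ corresponds to the symmetric matrix with ones at $(i, j)$ and $(j, i)$. Under that assumption, the Frobenius inner product corresponds to a diagonal $M$ with weight 1 for diagonal entries and 2 for off-diagonal entries. Function names are hypothetical.

```python
import numpy as np

def vec(P):
    """Upper triangular part of a symmetric matrix as a vector of length n(n+1)/2."""
    return P[np.triu_indices(P.shape[0])]

def frobenius_gram(n):
    """Gram matrix M of the Frobenius inner product in the vec() basis."""
    i, j = np.triu_indices(n)
    # Off-diagonal basis matrices have two unit entries, hence squared norm 2.
    return np.diag(np.where(i == j, 1.0, 2.0))

def quadratic_form(M, P1, P2):
    """Right-hand side of (4)."""
    diff = vec(P1) - vec(P2)
    return float(diff @ M @ diff)
```

With `M = frobenius_gram(n)` the quadratic form equals the squared Frobenius norm of $P_1 - P_2$, so the square root of (4) recovers the induced norm; choosing $M = I$ instead would weight off-diagonal entries differently.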

Result 4.2: Let $M \in S_d^{++}$, where $d = \frac{n(n+1)}{2}$. Then, $M$ defines a unique inner
product denoted by $\langle \cdot , \cdot \rangle_M$ on
$T_I S_n^{++} = \{\log(P) \mid P \in S_n^{++}\} = S_n$. This inner
product $\langle \cdot , \cdot \rangle_M$ defines a log-Euclidean Riemannian metric which can be
obtained by simply extending $\langle \cdot , \cdot \rangle_M$ to the Lie group
$(S_n^{++}, \odot)$ by left- or right-multiplication.
The corresponding log-Euclidean geodesic distance between $P_1 \in S_n^{++}$ and $P_2 \in S_n^{++}$
is given by

$$d_M(P_1, P_2) = (\mathrm{vec}(\log(P_1)) - \mathrm{vec}(\log(P_2)))^\top M \, (\mathrm{vec}(\log(P_1)) - \mathrm{vec}(\log(P_2))). \tag{5}$$

The above result follows directly from Result 4.1. Result 4.2 says that any Mahalanobis
distance defined in the vector space $\{\mathrm{vec}(\log(P)) \mid P \in S_n^{++}\}$ is a geodesic
distance on $S_n^{++}$ and the corresponding Riemannian metric is uniquely defined by the
Mahalanobis matrix $M$. Hence, we can learn Riemannian metrics/geodesic distances
for $S_n^{++}$ from the data by learning Mahalanobis distance functions in the vector space
$\{\mathrm{vec}(\log(P)) \mid P \in S_n^{++}\}$. Table 3 summarizes our approach for learning geodesic
distances on $S_n^{++}$. In this work, we use the ITML technique for Mahalanobis distance learning.
Table 3: Algorithm for learning geodesic distances on $S_n^{++}$

Input: $\{P_i \in S_n^{++}\}_{i=1}^{N}$
for i = 1 to N
    $v_i = \mathrm{vec}(\log(P_i))$
end
Learn a Mahalanobis distance function using $\{v_i\}_{i=1}^{N}$. This gives a Mahalanobis matrix
$M \in S_d^{++}$, where $d = \frac{n(n+1)}{2}$.
Output: Geodesic distance between $P_1$ and $P_2$:

$$(\mathrm{vec}(\log(P_1)) - \mathrm{vec}(\log(P_2)))^\top M \, (\mathrm{vec}(\log(P_1)) - \mathrm{vec}(\log(P_2))).$$
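The algorithm in Table 3 can be sketched in a few lines of NumPy. The Mahalanobis learner is passed in as a callable so that any method can be plugged in. The learner shown here, `inverse_covariance_stand_in`, is a hypothetical placeholder (a regularized inverse covariance) used only to make the sketch runnable; it is not ITML and does not use similarity or dissimilarity constraints.

```python
import numpy as np

def log_vec(P):
    """vec(log(P)): upper triangular part of the matrix logarithm of an SPD matrix."""
    w, V = np.linalg.eigh(P)
    L = (V * np.log(w)) @ V.T
    return L[np.triu_indices(P.shape[0])]

def learn_geodesic_distance(spd_matrices, learn_mahalanobis):
    """Map inputs to the log domain, learn M there, and return (M, distance function)."""
    V = np.stack([log_vec(P) for P in spd_matrices])   # shape (N, d), d = n(n+1)/2
    M = learn_mahalanobis(V)

    def distance(P1, P2):
        diff = log_vec(P1) - log_vec(P2)
        return float(diff @ M @ diff)                   # quadratic form of (5)

    return M, distance

def inverse_covariance_stand_in(V, ridge=1e-3):
    """Placeholder learner (NOT ITML): regularized inverse covariance of the rows of V."""
    C = np.cov(V, rowvar=False) + ridge * np.eye(V.shape[1])
    return np.linalg.inv(C)
```

A constraint-based learner such as ITML would take the place of the stand-in, receiving the vectors $v_i$ together with the similar and dissimilar pairs.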

5 Experiments 

In this section, we evaluate the performance of the proposed Riemannian metric/geodesic
distance learning approach on two applications: (i) face matching using the Labeled Faces
in the Wild (LFW) dataset and (ii) semi-supervised clustering using the ETH80 dataset.


5.1 Face matching using LFW face dataset 

In this experiment our aim is to predict whether a given pair of face images correspond 
to the same person or not. 

Dataset: The LFW dataset [9] is a collection of face photographs designed for studying
the problem of unconstrained face recognition. It consists of 13233 labeled
face images of 5749 subjects collected from the web, 1680 of whom appear in two or more
images. The dataset provides two subsets:

- Development subset: The development subset consists of 2200 training image 
pairs, where 1100 are similar pairs and 1100 are dissimilar pairs, and 1000 test 
image pairs, where 500 are similar pairs and 500 are dissimilar pairs. An image 
pair is said to be similar if both the images correspond to the same person and 
dissimilar if they correspond to different persons. 

- Evaluation subset: The evaluation subset consists of 3000 similar image pairs and 
3000 dissimilar image pairs. It is further divided into 10 subsets each of which 
consists of 300 similar pairs and 300 dissimilar pairs. 

All the image pairs were generated by randomly selecting images from the 13233 images
in the dataset. The development subset is meant for model and parameter selection.
The evaluation subset should be used only once for final training and testing. To avoid
overfitting, the image pairs in the development subset were chosen to be different from
the image pairs in the evaluation subset.


Feature extraction We crop the face region in each image and resize it to a 64 x 64
image. Following [3], we convert each pixel in an image into a 9-dimensional feature
vector given by

$$\left[\, x,\ y,\ R(x,y),\ G(x,y),\ B(x,y),\ \frac{\partial W(x,y)}{\partial x},\ \frac{\partial W(x,y)}{\partial y},\ \frac{\partial^2 W(x,y)}{\partial x^2},\ \frac{\partial^2 W(x,y)}{\partial y^2} \,\right],$$

where x, y are the column and row coordinates respectively, R, G and B are the color
coordinates and W is the grayscale image. We use the 9 x 9 covariance matrix of the
feature vectors to represent the image.
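A covariance descriptor of this kind could be computed along the following lines. Several details are assumptions made only for this sketch: the grayscale image is taken as the channel mean, derivatives are approximated with `np.gradient` (central differences), and a small multiple of the identity is added so that the result is strictly positive definite. The function name and the `eps` parameter are hypothetical.

```python
import numpy as np

def covariance_descriptor(rgb, eps=1e-6):
    """9 x 9 covariance of per-pixel features [x, y, R, G, B, Wx, Wy, Wxx, Wyy]."""
    rgb = np.asarray(rgb, dtype=float)
    h, w, _ = rgb.shape
    W = rgb.mean(axis=2)                         # assumed grayscale conversion
    y, x = np.mgrid[0:h, 0:w].astype(float)      # row and column coordinates
    Wy, Wx = np.gradient(W)                      # derivatives along rows (y) and columns (x)
    Wyy = np.gradient(Wy, axis=0)
    Wxx = np.gradient(Wx, axis=1)
    feats = np.stack(
        [x, y, rgb[..., 0], rgb[..., 1], rgb[..., 2], Wx, Wy, Wxx, Wyy], axis=-1
    ).reshape(-1, 9)
    C = np.cov(feats, rowvar=False)              # 9 x 9 sample covariance
    return C + eps * np.eye(9)                   # assumed regularization to keep C SPD
```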

Experimental protocol Following the standard experimental protocol for this dataset,
we use the development set for selecting the parameters of ITML and then use the
evaluation set only once for final training and testing. The following steps summarize our
experimental procedure:

- Parameter selection: We train the ITML algorithm using the 2200 training pairs 
of the development subset and then test it on the 1000 test pairs of the development 
subset. We select the ITML parameters that give the best test accuracy. 

- Final training and testing: The evaluation set consists of 10 splits and we perform 
10-fold cross-validation. In each fold, we use 9 splits (2700 similar pairs and 2700 
dissimilar pairs) for training ITML and 1 split (300 similar pairs and 300 dissimilar 
pairs) for testing. For training ITML, we use the parameters that were selected 
in the previous step. Since our task is face matching, we need to threshold the 
learned distance function. In each fold, we find the threshold that gives best training 
accuracy and use the same threshold for test image pairs. 
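The thresholding step in each fold can be sketched as follows: candidate thresholds are taken from the observed training distances, the one with the highest training accuracy is kept, and the same value is applied to the test pairs. The function names are hypothetical, and the tie-breaking rule (keep the smallest best threshold) is an assumption of this sketch.

```python
import numpy as np

def best_threshold(train_distances, train_is_similar):
    """Threshold t maximizing training accuracy of the rule 'similar if distance <= t'."""
    d = np.asarray(train_distances, dtype=float)
    s = np.asarray(train_is_similar, dtype=bool)
    # -inf corresponds to predicting 'dissimilar' for every pair.
    candidates = np.concatenate(([-np.inf], np.unique(d)))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = np.mean((d <= t) == s)
        if acc > best_acc:                       # strict '>' keeps the smallest best threshold
            best_t, best_acc = t, acc
    return float(best_t), float(best_acc)

def matching_accuracy(distances, is_similar, threshold):
    """Accuracy of the thresholded distance on a set of labeled pairs."""
    d = np.asarray(distances, dtype=float)
    s = np.asarray(is_similar, dtype=bool)
    return float(np.mean((d <= threshold) == s))
```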

Comparative methods We compare the performance of the proposed log-Euclidean 
metric learning approach with the following approaches: 

- Directly use any of the following distances for matching: 

• Frobenius, Cholesky-Frobenius, J-divergence, Jensen-Bregman LogDet divergence,
Affine-invariant and Log-Frobenius.

- Use ITML directly with the covariance matrices by treating them as elements of 
the Euclidean space of symmetric matrices. 

- Use ITML with the lower triangular matrix obtained by Cholesky decomposition. 

In all these methods the distance threshold is obtained in each fold independently based 
on the training data. 

Parameters The following parameter values were used for ITML: 

- $M_0 = I$, $\gamma = 10^{3.5}$, $a = 5$, $b = 95$.

These parameters were selected using the development subset of the dataset. 

Results Tables 4 and 5 summarize the prediction results for various approaches on the
LFW dataset. We can draw the following conclusions from these results:

- The proposed Riemannian metric/geodesic distance learning approach outperforms
the other approaches for comparing covariance matrices.

- The log-Euclidean geodesic distance learned from the data performs much better
than the standard log-Frobenius distance.

- Distance learning with original covariance matrices or Cholesky decompositions 
performs poorly compared to distance learning in the logarithm domain. 
Table 4: Prediction accuracy (%) on LFW dataset using various SPD matrix distances

| Frobenius | Cholesky-Frobenius | Log-Frobenius | J-divergence | Jensen-Bregman LogDet divergence | Affine-invariant |
|---|---|---|---|---|---|
| 53.77 | 56.62 | 60.43 | 60.92 | 61.62 | 61.15 |

Table 5: Prediction accuracy (%) on LFW dataset using distance learning

| Representation | Frobenius | ITML | ITML gain |
|---|---|---|---|
| Covariance matrices | 53.77 | 57.58 | 3.81 |
| Cholesky decompositions | 56.62 | 63.53 | 6.91 |
| Log-Euclidean | 60.43 | 69.37 | 8.94 |


5.2 Semi-supervised clustering using ETH80 object dataset 

In this experiment, we are interested in clustering the images in the ETH80 dataset into 
different object categories. 


Dataset The ETH80 object dataset [10] consists of 256 x 256 images of 8 object categories
with each category including 10 different object instances. Each object instance
has 41 images captured under different views. So, each object category has 410 images
resulting in a total of 3280 images.


Feature extraction We convert each pixel in an image into a 9-dimensional feature
vector given by

$$\left[\, x,\ y,\ R(x,y),\ G(x,y),\ B(x,y),\ \frac{\partial W(x,y)}{\partial x},\ \frac{\partial W(x,y)}{\partial y},\ \frac{\partial^2 W(x,y)}{\partial x^2},\ \frac{\partial^2 W(x,y)}{\partial y^2} \,\right],$$

where x, y are the column and row coordinates respectively, R, G and B are the color
coordinates and W is the grayscale image. We compute the 9 x 9 covariance matrix of
the feature vectors over the entire image and use it to represent the image.


Experimental protocol and parameters For every object category, we randomly select
4 images from each instance for training. Hence, we use 40 samples from each object
category for training, resulting in a total of 320 training images. From each pair of
training images, we generate either a similarity constraint or a dissimilarity constraint
based on their category labels. We use all such constraints in learning the Mahalanobis
distance function. Once we learn the Mahalanobis distance function, we use it for
clustering the entire dataset of 3280 images.
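Generating the constraints from category labels can be sketched as below. With 8 categories of 40 training images each, the 320 images give 320 x 319 / 2 = 51040 pairs, of which 8 x (40 x 39 / 2) = 6240 are similarity constraints and the remaining 44800 are dissimilarity constraints. The function name is hypothetical.

```python
from itertools import combinations

def pairwise_constraints(labels):
    """Split all index pairs (i, j), i < j, into similar and dissimilar by label."""
    similar, dissimilar = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            similar.append((i, j))
        else:
            dissimilar.append((i, j))
    return similar, dissimilar
```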

We repeat the above procedure 5 times and report the average clustering accuracy. In
each run, we select the value of the ITML parameter $\gamma$ using two-fold cross-validation on
the training data. We use the following values for the other ITML parameters in all the 5
runs: $M_0 = I$, $a = 5$, $b = 95$.

We use the K-means algorithm for clustering. To handle the local-optimum issue, we run
K-means with 20 different random initializations and select the clustering result
corresponding to the minimum K-means cost value.
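The restart strategy can be sketched with a plain NumPy implementation of Lloyd's algorithm. To cluster under a learned Mahalanobis matrix $M$, one option (assumed here, not stated in the text) is to map the vectors through a Cholesky factor of $M$, after which squared Euclidean distances equal the Mahalanobis quadratic forms. All function names are hypothetical.

```python
import numpy as np

def to_euclidean(V, M):
    """Map rows v to v^T L with M = L L^T, so Euclidean distances match the M-distances."""
    return V @ np.linalg.cholesky(M)

def kmeans(X, k, rng, n_iter=100):
    """One run of Lloyd's algorithm from a random initialization."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    cost = d2[np.arange(len(X)), labels].sum()
    return labels, centers, cost

def kmeans_restarts(X, k, n_init=20, seed=0):
    """Run K-means n_init times and keep the run with the minimum cost."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])
```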


Comparative methods We compare the performance of the proposed log-Euclidean 
metric learning approach with the following approaches: 

- Unsupervised: Directly perform K-means clustering using any of the following 
distances: Frobenius, Cholesky-Frobenius and Log-Frobenius. 

- Use ITML directly with the covariance matrices by treating them as elements of 
the Euclidean space of symmetric matrices. 

- Use ITML with the lower triangular matrix obtained by Cholesky decomposition. 

Computing the mean does not have a closed-form solution in the case of the J-divergence, the
Jensen-Bregman LogDet divergence or the affine-invariant distance. Hence, an optimization
procedure is needed to compute the mean, which makes the K-means algorithm
computationally expensive. For this reason, we do not use these distances for comparison in this work.
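By contrast, the mean under the log-Frobenius distance has a closed form: it is the matrix exponential of the arithmetic mean of the matrix logarithms, a known property of the log-Euclidean framework [6]. A minimal sketch, with hypothetical function names:

```python
import numpy as np

def spd_log(P):
    """Matrix logarithm of an SPD matrix."""
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def sym_exp(S):
    """Matrix exponential of a symmetric matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_euclidean_mean(spd_matrices):
    """Closed-form mean under the log-Frobenius distance: exp(mean of logs)."""
    logs = np.stack([spd_log(P) for P in spd_matrices])
    return sym_exp(logs.mean(axis=0))
```

The same arithmetic mean in the log domain also minimizes the sum of quadratic forms for any positive definite $M$, which is why K-means in the $\mathrm{vec}(\log(\cdot))$ representation needs no iterative mean computation.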


Results Table 6 summarizes the clustering results for various approaches on the ETH80 
dataset. We can draw the following conclusions from these results: 

- The proposed Riemannian metric/geodesic distance learning approach performs 
better than other approaches for clustering SPD matrices. 

- The log-Euclidean geodesic distance learned from the data performs much better 
than the standard log-Frobenius distance. 

- Distance learning with original covariance matrices or Cholesky decompositions 
performs poorly compared to distance learning in the logarithm domain. 


6 Conclusion 

In this work, we have explored the idea of data-driven Riemannian metrics or geodesic
distances. Based on the log-Euclidean framework [6], we have shown how geodesic distance
functions can be learned for $S_n^{++}$ by simply learning Mahalanobis distance functions
in the logarithm domain. We have conducted experiments using face and object
datasets. The face matching and semi-supervised object categorization results clearly
show that the learned log-Euclidean geodesic distance performs much better than other
distances.


Table 6: Clustering accuracy (%) on ETH80 dataset

| Representation | Frobenius | ITML | ITML gain |
|---|---|---|---|
| Covariance matrices | 35.58 | 70.50 | 34.92 |
| Cholesky decompositions | 51.13 | 70.36 | 19.24 |
| Log-Euclidean | 55.70 | 73.79 | 18.09 |